Long-Term Exploration in Persistent MDPs

نویسندگان

چکیده

Exploration is an essential part of reinforcement learning, which restricts the quality learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, exhaustive exploration environment often impossible, successful training agent requires a lot interaction steps. this paper, we propose method called Rollback-Explore (RbExplore), utilizes concept persistent Markov decision process, in agents during can roll back to visited states. We test our algorithm hard-exploration Prince Persia game, without rewards domain knowledge. At all used levels outperforms or shows comparable results with state-of-the-art curiosity methods knowledge-based intrinsic motivation: ICM RND. An implementation RbExplore be found at https://github.com/cds-mipt/RbExplore.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Long-term persistent fetomaternal hemorrhage

It is not clear that how long the affected fetuses can tolerate fetomaternal hemorrhage (FMH). Incidental serial measurements of the fetal peak systolic velocity of the middle cerebral artery and the retrospective analysis of stocked blood available incidentally indicated that our patient had suffered from FMH for at least 2 weeks prior to delivery.

متن کامل

Exploration-Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first a...

متن کامل

Plants in Long Term Lunar Exploration

IntroductionScience and Exploration Rational: Plant are complex eukaryotic organisms that share fundamental metabolic and genetic processes with humans and all higher organisms, yet their sessile nature requires that plants deal with their environment by adaptation in situ. Thus plants have evolved to deal with environmental change and stress by responding with changes in metabolism in order to...

متن کامل

Autonomous Exploration For Navigating In MDPs

While intrinsically motivated learning agents hold considerable promise to overcome limitations of more supervised learning systems, quantitative evaluation and theoretical analysis of such agents are difficult. We propose to consider a restricted setting for autonomous learning where systematic evaluation of learning performance is possible. In this setting the agent needs to learn to navigate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Lecture Notes in Computer Science

سال: 2021

ISSN: ['1611-3349', '0302-9743']

DOI: https://doi.org/10.1007/978-3-030-89817-5_8